Reducing communication in sparse solvers
Sparse matrix operations dominate the cost of many scientific applications. In parallel, the performance and scalability of these operations are limited by irregular point-to-point communication. This dissertation investigates multiple methods for reducing the communication cost of sparse matrix operations. Algorithmic changes reduce communication requirements, but they also affect the accuracy of the operation, leading to reduced convergence of scientific codes. We investigate a method of systematically removing relatively small non-zeros throughout an algebraic multigrid hierarchy, yielding significant reductions in the cost of sparse matrix-vector multiplication that outweigh the effects of the reduced accuracy of the multiplication. That is, the reduction in per-iteration communication cost outweighs the cost of the extra solver iterations. As a result, sparsification improves both the performance and scalability of algebraic multigrid.
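As a rough illustration of removing relatively small non-zeros, the following minimal sketch drops off-diagonal entries that are small relative to the largest off-diagonal entry in their row. The abstract does not specify the drop criterion, so the relative threshold `tol`, the choice to lump dropped values onto the diagonal (which preserves row sums), and the dict-of-rows storage are all assumptions made for the example, not the dissertation's method.

```python
def sparsify_row(row, diag_index, tol):
    """Return a copy of one sparse row with small off-diagonal entries removed.

    row: dict mapping column index -> value (hypothetical storage format).
    An off-diagonal entry is dropped when its magnitude is below
    tol * (largest off-diagonal magnitude in the row). Dropped values are
    added to the diagonal so the row sum is unchanged (an assumed choice).
    """
    off_diag = [abs(v) for j, v in row.items() if j != diag_index]
    if not off_diag:
        return dict(row)
    cutoff = tol * max(off_diag)
    kept = {}
    lumped = 0.0
    for j, v in row.items():
        if j != diag_index and abs(v) < cutoff:
            lumped += v          # entry is removed from the sparsity pattern
        else:
            kept[j] = v
    if lumped != 0.0:
        kept[diag_index] = kept.get(diag_index, 0.0) + lumped
    return kept


def sparsify(matrix, tol):
    """Apply sparsify_row to every row of a dict-of-rows matrix."""
    return {i: sparsify_row(row, i, tol) for i, row in matrix.items()}


def count_nonzeros(matrix):
    """Number of stored entries; fewer entries means less data to communicate."""
    return sum(len(row) for row in matrix.values())


A = {
    0: {0: 4.0, 1: -1.0, 2: -0.01},
    1: {0: -1.0, 1: 4.0, 2: -1.0},
    2: {0: -0.01, 1: -1.0, 2: 4.0},
}
S = sparsify(A, tol=0.1)
```

In a distributed setting, each removed off-diagonal entry can eliminate a vector value that would otherwise have to be received from another process, which is where the communication savings come from; the price is a less accurate operator and possibly more solver iterations.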
Alterations to the parallel implementation of MPI communication also yield reduced costs with no effect on accuracy. We investigate methods of agglomerating messages on-node before injecting into the network, reducing the amount of costly inter-node communication. This node-aware communication yields improvements to both performance and scalability of matrix operations, particularly in strong scaling studies. Furthermore, we show an improvement in the cost of algebraic multigrid as a result of reduced communication costs in sparse matrix operations.
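The effect of agglomerating messages on-node can be sketched by counting inter-node messages before and after combining everything that travels between the same pair of nodes. This is only a counting model under stated assumptions: ranks are mapped to nodes in contiguous blocks of `ppn` processes, and the extra on-node gather and redistribution steps that a real node-aware scheme needs are not modeled. The function names and the example message list are hypothetical.

```python
def node_of(rank, ppn):
    """Node index of a rank, assuming ranks are placed in contiguous blocks."""
    return rank // ppn


def count_inter_node_messages(messages, ppn):
    """Count messages whose source and destination ranks are on different nodes.

    messages: iterable of (src_rank, dst_rank, nbytes) tuples.
    """
    return sum(1 for s, d, _ in messages if node_of(s, ppn) != node_of(d, ppn))


def aggregate_by_node_pair(messages, ppn):
    """Combine all inter-node messages that share a (source node, destination node).

    Returns a dict mapping (src_node, dst_node) -> total bytes, i.e. one
    inter-node message per communicating node pair.
    """
    aggregated = {}
    for s, d, nbytes in messages:
        src_node, dst_node = node_of(s, ppn), node_of(d, ppn)
        if src_node != dst_node:
            key = (src_node, dst_node)
            aggregated[key] = aggregated.get(key, 0) + nbytes
    return aggregated


ppn = 4  # processes per node (assumed)
messages = [(0, 4, 8), (1, 5, 8), (2, 4, 16), (3, 9, 8), (0, 1, 8)]
before = count_inter_node_messages(messages, ppn)
after = aggregate_by_node_pair(messages, ppn)
```

In this example four inter-node messages collapse into two, while the number of bytes crossing the network stays the same; the saving is in the per-message cost of injecting into the network.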
Finally, performance models can be used to analyze the costs of matrix operations, indicating the source of dominant communication costs, such as initializing messages or transporting bytes of data. We investigate methods of improving traditional performance models of irregular point-to-point communication through the addition of node-awareness, queue search costs, and network contention penalties.
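One traditional model of this kind is the postal model, in which sending n messages totalling s bytes costs alpha * n + beta * s, with alpha the per-message start-up cost and beta the per-byte transport cost. The sketch below adds node-awareness in the simplest possible way, by using separate (alpha, beta) pairs for on-node and off-node messages. The parameter values are placeholders rather than measurements, ranks are assumed to be placed on nodes in contiguous blocks of `ppn`, messages are assumed to be sent one after another, and the queue search and network contention terms mentioned above are not modeled.

```python
def postal_cost(n_msgs, n_bytes, alpha, beta):
    """Postal model: start-up cost per message plus transport cost per byte."""
    return alpha * n_msgs + beta * n_bytes


def node_aware_cost(messages, ppn, on_node, off_node):
    """Sum postal costs, choosing parameters by whether a message leaves the node.

    messages: iterable of (src_rank, dst_rank, nbytes) tuples.
    on_node, off_node: (alpha, beta) pairs (hypothetical values in the example).
    """
    total = 0.0
    for src, dst, nbytes in messages:
        same_node = (src // ppn) == (dst // ppn)
        alpha, beta = on_node if same_node else off_node
        total += postal_cost(1, nbytes, alpha, beta)
    return total


# Placeholder parameters in arbitrary time units, not measured values.
on_node = (1.0, 0.25)
off_node = (4.0, 0.5)
messages = [(0, 1, 8), (0, 4, 8)]
cost = node_aware_cost(messages, ppn=4, on_node=on_node, off_node=off_node)
```

Comparing the alpha and beta contributions separately indicates whether message initialization or byte transport dominates for a given communication pattern.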
MPI Advance: Open-Source Message Passing Optimizations
The many production implementations of the Message Passing
Interface (MPI) each provide unique and varying underlying algorithms. Each
emerging supercomputer supports one or a small number of system MPI
installations, tuned for the given architecture. Performance varies with MPI
version, but application programmers are typically unable to achieve optimal
performance with local MPI installations and therefore rely on whichever
implementation is provided as a system install. This paper presents MPI
Advance, a collection of libraries that sit on top of MPI, optimizing the
underlying performance of any existing MPI library. The libraries provide
optimizations for collectives, neighborhood collectives, partitioned
communication, and GPU-aware communication. Comment: Available on conference website:
https://eurompi23.github.io/assets/papers/EuroMPI23_paper_33.pd
Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism
Irregular communication often limits both the performance and scalability of
parallel applications. Typically, applications individually implement irregular
messages using point-to-point communications, and any optimizations are added
directly into the application. As a result, these optimizations lack
portability. There is no easy way to optimize point-to-point messages within
MPI, as the interface for single messages provides no information on the
collection of all communication to be performed. However, the persistent
neighbor collective API, released in the MPI 4 standard, provides an interface
for portable optimizations of irregular communication within MPI libraries.
This paper presents methods for optimizing irregular communication within
neighborhood collectives, analyzes the impact of replacing point-to-point
communication in existing codebases such as Hypre BoomerAMG with neighborhood
collectives, and finally shows a speedup of up to 1.32x for sparse matrix-vector
multiplication within a BoomerAMG solve through the use of our optimized
neighbor collectives. The authors analyze multiple implementations of
neighborhood collectives, including a standard implementation, which simply
wraps standard point-to-point communication, as well as multiple
implementations of locality-aware aggregation. All optimizations are available
in an open-source codebase, MPI Advance, which sits on top of MPI, allowing for
optimizations to be added into existing codebases regardless of the system MPI
install.
Collective-Optimized FFTs
This paper measures the impact of the various alltoallv methods. Results are
analyzed within Beatnik, a Z-model solver that is bottlenecked by HeFFTe and
representative of applications that rely on FFTs.
Targets of Wnt/β-catenin transcription in penile carcinoma
Penile squamous cell carcinoma (PeCa) is a rare malignancy, and little is known regarding the molecular mechanisms involved in carcinogenesis of PeCa. The Wnt signaling pathway, with the transcription activator β-catenin as a major transducer, is a key cellular pathway during development and in disease, particularly cancer. We have used PeCa tissue arrays and multi-fluorophore-labelled quantitative immunohistochemistry to interrogate the expression of WNT4, a Wnt ligand, and three targets of Wnt/β-catenin transcription activation, namely MMP7, cyclinD1 (CD1) and c-MYC, in 141 penile tissue cores from 101 unique samples. The expression of all Wnt signaling proteins tested was increased by 1.6- to 3-fold in PeCa samples compared to control tissue (normal or cancer-adjacent) samples (p<0.01). Expression of all proteins, except CD1, showed a significant decrease in grade II compared to grade I tumors. High-magnification, deconvolved confocal images were used to measure differences in co-localization between the four proteins. Significant (p<0.04-0.0001) differences were observed for various permutations of the combinations of proteins and state of the tissue (control, tumor grades I and II). Wnt signaling may play an important role in PeCa, and proteins of the Wnt signaling network could be useful targets for diagnosis and prognostic stratification of disease.